CLAIMS 

1. A method for modeling an information request, comprising:

receiving unlabeled and labeled documents; 

extracting a set of features from each document; 
learning a model from example documents marked as positive or negative with respect to
said request wherein said model scores documents to evaluate a degree of membership in a
group responsive to said request;

evaluating the performance of model settings on example documents; 

applying an adjustment algorithm that provides a threshold value $\theta_{new}$;

applying a scoring function that computes score, a value assigned to a document by the
learnt model, and classifies the document based on the sign of the following equation:

$$\text{Class}(X) = \text{Sign}(\text{score} - \theta_{new})$$
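
For illustration only, the final classification step of claim 1 can be sketched in Python. The function name `classify`, the name `theta_new` for the threshold produced by the adjustment algorithm, and the choice to treat a zero margin as negative are assumptions; the claim does not specify them.

```python
def classify(score: float, theta_new: float) -> int:
    """Return +1 (responsive) or -1 (not responsive) from the sign of score - theta_new."""
    margin = score - theta_new
    # Assumption: a margin of exactly zero is treated as negative.
    return 1 if margin > 0 else -1


if __name__ == "__main__":
    # Hypothetical scores against a hypothetical adjusted threshold of 0.5.
    for s in (0.8, 0.2):
        print(s, classify(s, theta_new=0.5))
```

Under this sketch, raising `theta_new` makes the classifier more conservative, since fewer documents clear the threshold.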

2. The method according to claim 1, wherein: 

said model is a support vector machine determined by training data from labeled example
documents.

3. The method according to claim 1, wherein: 

said model comprises a list of terms and weights extracted from labeled example 
documents. 

4. The method according to claim 1, wherein:

said model comprises a list of terms and corresponding weights and a threshold value 
determined by: 

extracting terms and features from the positive and negative documents; 
ranking terms and features; 
selecting a subset of terms and features from the ranked terms and features;
assigning a weight $w_i$ for each term and feature;
setting a threshold $\theta$ for the model to zero.
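
The five steps of claim 4 can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not the claimed implementation: documents are taken to be lists of already-extracted terms, the ranking function is passed in by the caller, the subset is simply the top `k` terms, the weight is the ranking score itself, and `mean_tf_difference` is a hypothetical placeholder ranking function. All names are illustrative.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Doc = List[str]  # assumption: a document is a list of extracted terms
RankFn = Callable[[str, List[Doc], List[Doc]], float]


def build_model(positive: List[Doc], negative: List[Doc],
                rank_score: RankFn, k: int) -> Tuple[Dict[str, float], float]:
    # Step 1: extract terms from the positive and negative documents.
    vocab = {t for d in positive + negative for t in d}
    # Step 2: rank terms, highest score first (ties broken alphabetically).
    ranked = sorted(vocab, key=lambda t: (-rank_score(t, positive, negative), t))
    # Step 3: select a subset of the ranked terms (here: the top k).
    selected = ranked[:k]
    # Step 4: assign a weight to each selected term (here: its ranking score).
    weights = {t: rank_score(t, positive, negative) for t in selected}
    # Step 5: the model threshold starts at zero.
    return weights, 0.0


def score(doc: Doc, weights: Dict[str, float]) -> float:
    # Assumption: a linear score, the sum of weight times term frequency.
    tf = Counter(doc)
    return sum(w * tf[t] for t, w in weights.items())


def mean_tf_difference(term: str, positive: List[Doc], negative: List[Doc]) -> float:
    # Hypothetical placeholder: mean frequency in positives minus mean in negatives.
    pos = sum(d.count(term) for d in positive) / max(len(positive), 1)
    neg = sum(d.count(term) for d in negative) / max(len(negative), 1)
    return pos - neg


if __name__ == "__main__":
    pos_docs = [["svm", "filter"], ["svm", "svm"]]
    neg_docs = [["cat", "filter"]]
    model_weights, theta = build_model(pos_docs, neg_docs, mean_tf_difference, k=1)
    print(model_weights, theta, score(["svm", "x", "svm"], model_weights))
```

Passing the ranking function as a parameter keeps the sketch independent of any particular scoring rule.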
5. The method according to claim 4, wherein: 
said terms and features are ranked in decreasing order of their Rocchio score calculated
as follows:

$$Rocchio_i = TF_{i,Text} + \alpha \left( \frac{1}{R} \sum_{D_j \in D_R \,\&\, w_i \in D_j} TF_{i,j} \right) - \beta \left( \frac{1}{N} \sum_{D_j \in D_N \,\&\, w_i \in D_j} TF_{i,j} \right)$$

in which

$TF_{i,Text}$: The frequency of feature i in the text description of the information need
$TF_{i,j}$: The number of occurrences of feature i in document j
$D_R$: Positive document set
$D_N$: Negative document set

R: the number of positive documents (i.e., the size of $D_R$)
N: the number of negative documents
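
As a rough illustration, a Rocchio-style score for one feature can be computed as below. The sketch assumes the common form of the score: the feature's frequency in the request text, plus `alpha` times its mean frequency over the positive documents, minus `beta` times its mean frequency over the negative documents. The default values `alpha=1.0` and `beta=1.0`, the function name, and the representation of documents as term lists are all assumptions.

```python
from typing import List

Doc = List[str]  # assumption: a document is a list of extracted terms


def rocchio_score(term: str, request_text: Doc, positive: List[Doc],
                  negative: List[Doc], alpha: float = 1.0, beta: float = 1.0) -> float:
    # Frequency of the feature in the text description of the information need.
    tf_text = request_text.count(term)
    # Sum of term frequencies over the documents that contain the term.
    pos_sum = sum(d.count(term) for d in positive if term in d)
    neg_sum = sum(d.count(term) for d in negative if term in d)
    # Normalize by R (number of positives) and N (number of negatives).
    pos_part = pos_sum / len(positive) if positive else 0.0
    neg_part = neg_sum / len(negative) if negative else 0.0
    return tf_text + alpha * pos_part - beta * neg_part


if __name__ == "__main__":
    request = ["svm", "filter"]
    pos_docs = [["svm", "svm"], ["svm", "cat"]]
    neg_docs = [["cat", "cat"], ["filter"]]
    terms = ["svm", "cat", "filter"]
    # Rank in decreasing order of Rocchio score.
    ranked = sorted(terms, key=lambda t: -rocchio_score(t, request, pos_docs, neg_docs))
    print(ranked)
```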
6. The method according to claim 4, wherein: 

said terms and features are assigned a weight as follows: 

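$w_i = Rocchio_i \cdot idf_i$, calculated as:

$$Rocchio_i = TF_{i,Text} + \alpha \left( \frac{1}{R} \sum_{D_j \in D_R \,\&\, w_i \in D_j} TF_{i,j} \right) - \beta \left( \frac{1}{N} \sum_{D_j \in D_N \,\&\, w_i \in D_j} TF_{i,j} \right)$$

in which

$TF_{i,Text}$: The frequency of feature i in the text description of the information need
$TF_{i,j}$: The number of occurrences of feature i in document j
$D_R$: Positive document set
$D_N$: Negative document set

R: the number of positive documents (i.e., the size of $D_R$)
N: the number of negative documents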
and $idf_i$ is calculated as follows:

$$idf_i = \log_2(S / n_i) + 1$$

where S is the count of documents in the set and $n_i$ is the count of the documents in which the i-th
feature occurs.
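
The inverse document frequency term and the resulting weight can be sketched as follows, assuming the form idf = log2(S / n) + 1 with S the number of documents in the set and n the number of documents containing the feature. The function and parameter names are illustrative, and the Rocchio score is taken as an already-computed input.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """idf = log2(S / n) + 1, where S = num_docs and n = doc_freq."""
    if doc_freq <= 0:
        # Assumption: a feature that occurs in no document has no defined idf.
        raise ValueError("doc_freq must be positive")
    return math.log2(num_docs / doc_freq) + 1


def term_weight(rocchio: float, num_docs: int, doc_freq: int) -> float:
    """Weight of a feature: its Rocchio score multiplied by its idf."""
    return rocchio * idf(num_docs, doc_freq)


if __name__ == "__main__":
    # A feature found in 2 of 8 documents has idf = log2(4) + 1 = 3.
    print(idf(8, 2), term_weight(2.5, 8, 2))
```

The added 1 keeps the idf positive even for a feature that occurs in every document, so such a feature still contributes its Rocchio score to the weight.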

7. A system for filtering documents, comprising: 

a computer coupled to a network wherein said computer receives documents over said 
network and transmits documents to an individual user over said network, wherein said computer: 
receiving unlabeled and labeled documents; 
extracting a set of features from each document; 

learning a model from the example documents marked as positive or negative with 
respect to a category wherein said model scores documents to evaluate a degree of membership 
in said category; 

evaluating the performance of model settings on example documents; 
applying an adjustment algorithm that provides a threshold value $\theta_{new}$;
applying a scoring function that computes score, a value assigned to a document by the 
learnt model, and classifies the document based on the sign of the following equation: 

$$\text{Class}(X) = \text{Sign}(\text{score} - \theta_{new})$$

8. A system as in claim 7, wherein: 

said model is a support vector machine determined by training data from labeled example 
documents. 

9. A system as in claim 7, wherein: 

said model comprises a list of terms and weights extracted from labeled example 
documents. 

10. The system of claim 7, wherein: 

said model comprises a list of terms and corresponding weights and a threshold value 
determined by: 

extracting terms and features from the positive and negative documents; 
ranking terms and features; 

selecting a subset of terms and features from the ranked terms and features; 
assigning a weight $w_i$ for each term and feature;

setting a threshold $\theta$ for the model to zero.

11. The system of claim 10, wherein:

said terms and features are ranked in decreasing order of their Rocchio score calculated 
as follows: 

$$Rocchio_i = TF_{i,Text} + \alpha \left( \frac{1}{R} \sum_{D_j \in D_R \,\&\, w_i \in D_j} TF_{i,j} \right) - \beta \left( \frac{1}{N} \sum_{D_j \in D_N \,\&\, w_i \in D_j} TF_{i,j} \right)$$

in which

$TF_{i,Text}$: The frequency of feature i in the text description of the information need
$TF_{i,j}$: The number of occurrences of feature i in document j
$D_R$: Positive document set
$D_N$: Negative document set

R: the number of positive documents (i.e., the size of $D_R$)
N: the number of negative documents
12. The system of claim 10, wherein: 

said terms and features are assigned a weight as follows: 

$w_i = Rocchio_i \cdot idf_i$, calculated as:

$$Rocchio_i = TF_{i,Text} + \alpha \left( \frac{1}{R} \sum_{D_j \in D_R \,\&\, w_i \in D_j} TF_{i,j} \right) - \beta \left( \frac{1}{N} \sum_{D_j \in D_N \,\&\, w_i \in D_j} TF_{i,j} \right)$$

where

$TF_{i,Text}$: The frequency of feature i in the text description of the information need
$TF_{i,j}$: The number of occurrences of feature i in document j
$D_R$: Positive document set
$D_N$: Negative document set

R: the number of positive documents (i.e., the size of $D_R$)
N: the number of negative documents



and $idf_i$ is calculated as follows:

$$idf_i = \log_2(N / n_i) + 1$$

where N is the count of documents in the set and $n_i$ is the count of the documents in which the i-th
feature occurs.

13. A method for retrieving information in response to a request, comprising: 
receiving unlabeled and labeled documents;

extracting a set of features from each document; 

learning a model from example documents marked as positive or negative with respect to 
said request wherein said model scores documents to evaluate a degree of membership in a 
group responsive to said request; 
evaluating the performance of model settings on example documents;

applying an adjustment algorithm that provides a threshold value $\theta_{new}$;

applying a scoring function that computes score, a value assigned to a document by the
learnt model, and classifies the document based on the sign of the following equation:

$$\text{Class}(X) = \text{Sign}(\text{score} - \theta_{new})$$

14. The method according to claim 13, wherein:

said model is a support vector machine determined by training data from labeled example 
documents. 

15. The method according to claim 13, wherein: 

said model comprises a list of terms and weights extracted from labeled example 
documents.

16. The method according to claim 13, wherein: 

said model comprises a list of terms and corresponding weights and a threshold value 
determined by: 

extracting terms and features from the positive and negative documents; 
ranking terms and features;

selecting a subset of terms and features from the ranked terms and features;
assigning a weight $w_i$ for each term and feature;
setting a threshold $\theta$ for the model to zero.

17. The method according to claim 16, wherein:

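said terms and features are ranked in decreasing order of their Rocchio score calculated
as follows:

$$Rocchio_i = TF_{i,Text} + \alpha \left( \frac{1}{R} \sum_{D_j \in D_R \,\&\, w_i \in D_j} TF_{i,j} \right) - \beta \left( \frac{1}{N} \sum_{D_j \in D_N \,\&\, w_i \in D_j} TF_{i,j} \right)$$

in which

$TF_{i,Text}$: The frequency of feature i in the text description of the information need
$TF_{i,j}$: The number of occurrences of feature i in document j
$D_R$: Positive document set
$D_N$: Negative document set

R: the number of positive documents (i.e., the size of $D_R$)
N: the number of negative documents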
18. The method according to claim 16, wherein:

said terms and features are assigned a weight as follows: 

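$w_i = Rocchio_i \cdot idf_i$, calculated as:

$$Rocchio_i = TF_{i,Text} + \alpha \left( \frac{1}{R} \sum_{D_j \in D_R \,\&\, w_i \in D_j} TF_{i,j} \right) - \beta \left( \frac{1}{N} \sum_{D_j \in D_N \,\&\, w_i \in D_j} TF_{i,j} \right)$$

in which

$TF_{i,Text}$: The frequency of feature i in the text description of the information need
$TF_{i,j}$: The number of occurrences of feature i in document j
$D_R$: Positive document set
$D_N$: Negative document set

R: the number of positive documents (i.e., the size of $D_R$)
N: the number of negative documents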

and $idf_i$ is calculated as follows:

$$idf_i = \log_2(S / n_i) + 1$$

where S is the count of documents in the set and $n_i$ is the count of the documents in which the i-th
feature occurs.